This investigation aims to analyze Occupational Safety and Health Administration (OSHA) data in Arkansas to identify which industries are inspected most often and which are the most unsafe.
The focus will be on determining which industries or companies have the most severe violations or complaints per business. It will emphasize construction and poultry businesses because of their high rates of fatalities and injuries.
The construction and poultry industries are known for having high rates of severe injuries and deaths. This investigation will explore OSHA inspection data to identify patterns in workplace safety violations, particularly in cases where employers have routinely ignored OSHA regulations, leading to dangerous working conditions.
According to the National Center for Farmworker Health (NCFH), an estimated half of poultry processing workers are Latino, half are women, and a quarter lack legal authorization to work in the U.S. “Chicken catchers” may be more likely to be male, Latino, and undocumented.
This analysis will include data scraping, data cleaning, graphs, and an interactive visualization to present the findings. The story would also have other media components, such as social media videos, a ‘How dangerous is your industry?’ calculator, explanatory videos about the coverage, audio of workers’ stories, and photos of victims and their families.
The story pitches below are based on data and case studies from the OSHA database. In the future, I hope to incorporate interviews with workers and workers’ advocacy groups, more data from state and federal agencies such as the BLS or NIOSH, and academic research into common hazards in these industries.
The project would also provide a data scraping tool so that newsrooms in other states can replicate this reporting.
Data Scraping:
Data Cleaning:
Data Analysis:
# Load the libraries
library(httr)
library(rvest)
library(tidyverse)
library(janitor)
library(readxl)
library(lubridate)
First, I tried to scrape the OSHA website, but I quickly ran into a ‘403’ (Forbidden) error when loading osha.gov. To get around it, I added request headers that mimic a web browser. The headers make the request look like it comes from a person looking for the data rather than from a bot.
With the headers defined, the 403 error no longer appeared, but a 304 (Not Modified) response did. I then modified the headers by changing the value of the cookie the site was requesting to ‘1’. Finally, I defined the URL with the OSHA query using two parameters, State = Arkansas (state=AR) and Office = Little Rock (office=627100), and requested both federal and state data from January 1, 2014, through June 5, 2024, roughly ten years of records.
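HTTP status codes have standard meanings: 200 is OK, 304 is Not Modified (sent in reply to a conditional request, with no page body), and 403 is Forbidden (the server understood the request but refused it). A minimal sketch of a helper that turns a status code into a next step, assuming the scraper only wants to parse pages that came back with 200. The function name `describe_osha_status` is hypothetical and is not part of httr.

```r
# Hypothetical helper: map an HTTP status code to a short note on what to do next.
# Only a 200 response carries a page body worth parsing.
describe_osha_status <- function(code) {
  switch(as.character(code),
    "200" = "OK: parse the HTML",
    "304" = "Not Modified: no body was sent; drop conditional headers such as If-None-Match",
    "403" = "Forbidden: the server refused the request; check the request headers",
    # Fallback for any other status code
    paste("Unexpected status", code)
  )
}

describe_osha_status(403)
describe_osha_status(500)
```

In the scraper, the result of `status_code(response)` would be passed to this helper before deciding whether to call `read_html()`.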
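Because the search is controlled entirely by query-string parameters, the URL can be assembled from named values instead of edited by hand, which also makes it easier to swap in another state. A minimal base-R sketch, assuming the parameter names and values that appear in the Arkansas search URL used in this project; `build_osha_url` is a hypothetical helper, and searches for other states may need different office codes.

```r
# Hypothetical helper: build an OSHA establishment-search URL from named parameters.
# Parameter names are copied from the Arkansas search URL; defaults reproduce it.
build_osha_url <- function(state = "AR", office = "627100",
                           start = as.Date("2014-01-01"),
                           end = as.Date("2024-06-05"),
                           show = 3500) {
  params <- c(
    establishment = "", state = state, officetype = "all", office = office,
    sitezip = "100000",
    startmonth = format(start, "%m"), startday = format(start, "%d"),
    startyear = format(start, "%Y"),
    endmonth = format(end, "%m"), endday = format(end, "%d"),
    endyear = format(end, "%Y"),
    p_case = "all", p_violations_exist = "both", p_start = "",
    p_finish = "20", p_sort = "12", p_desc = "DESC", p_direction = "Next",
    p_show = as.character(show)  # number of results shown on one page
  )
  # Percent-encode each value, then join as name=value pairs
  encoded <- vapply(params, URLencode, character(1), reserved = TRUE)
  query <- paste0(names(params), "=", encoded, collapse = "&")
  paste0("https://www.osha.gov/ords/imis/establishment.search?", query)
}

built_url <- build_osha_url()
```

With the default arguments, `built_url` is identical to the hand-written Arkansas URL used in the scraper; changing `state` and `office` is all another newsroom would need to edit.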
# Define the headers
headers <- c(
"Accept" = "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
"Accept-Encoding" = "gzip, deflate, br, zstd",
"Accept-Language" = "en-US,en;q=0.9",
"Cache-Control" = "max-age=0",
"Cookie" = "_gid=1",
# If-Modified-Since and If-None-Match are left out: these conditional headers
# allow the server to answer 304 (Not Modified) with an empty body instead of the page.
"Priority" = "u=0, i",
"Sec-Ch-Ua" = "\"Google Chrome\";v=\"125\", \"Chromium\";v=\"125\", \"Not.A/Brand\";v=\"24\"",
"Sec-Ch-Ua-Mobile" = "?0",
"Sec-Ch-Ua-Platform" = "\"macOS\"",
"Sec-Fetch-Dest" = "document",
"Sec-Fetch-Mode" = "navigate",
"Sec-Fetch-Site" = "none",
"Sec-Fetch-User" = "?1",
"Upgrade-Insecure-Requests" = "1",
"User-Agent" = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36"
)
# The URL only displayed the first 20 results, but by changing the p_show parameter in the URL to show 3500 results per page, I was able to access all of them.
# Define the URL
url <- "https://www.osha.gov/ords/imis/establishment.search?establishment=&state=AR&officetype=all&office=627100&sitezip=100000&startmonth=01&startday=01&startyear=2014&endmonth=06&endday=05&endyear=2024&p_case=all&p_violations_exist=both&p_start=&p_finish=20&p_sort=12&p_desc=DESC&p_direction=Next&p_show=3500"
# Make the GET request with headers
response <- GET(url, add_headers(.headers = headers))
# Check the status code
if (status_code(response) == 200) {
  # If the request is successful, parse the HTML content
  page_text <- content(response, as = "text", encoding = "UTF-8")
  webpage <- read_html(page_text)
  # Extract every table on the page as a data frame
  tables <- webpage %>%
    html_elements("table") %>%
    html_table()
  # At first the scraper only looked at the first table; after inspecting the page's HTML, I changed it to use the second table, which holds all of the relevant data.
  # Check if there are at least two tables
  if (length(tables) >= 2) {
    # Keep and print the second table (the search results)
    cleaned_table <- tables[[2]]
    print(cleaned_table)
  } else {
    # Stop here, because the cleaning steps below need cleaned_table
    stop("Fewer than two tables found on the page.")
  }
} else {
  stop("Failed to fetch the page. Status code: ", status_code(response))
}
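Selecting `tables[[2]]` depends on the page layout staying the same. A slightly more defensive sketch picks the first table whose header contains the expected columns. It assumes the scraped tables are data frames (as `html_table()` returns); `pick_table` is a hypothetical helper, and the toy tables and column names below are illustrative, not the real OSHA page.

```r
# Hypothetical helper: return the first table whose column names include all
# required names, compared case-insensitively after replacing runs of
# non-alphanumeric characters with underscores.
pick_table <- function(tables, required) {
  normalize <- function(x) gsub("[^a-z0-9]+", "_", tolower(x))
  for (tbl in tables) {
    if (all(normalize(required) %in% normalize(names(tbl)))) {
      return(tbl)
    }
  }
  stop("No table contains the columns: ", paste(required, collapse = ", "))
}

# Toy stand-ins for the list returned by html_table(): a layout table, then results
toy_tables <- list(
  data.frame(X1 = "Search form", X2 = ""),
  data.frame(NAICS = c(311615, 236220), Type = c("Fat/Cat", "Complaint"),
             `Establishment Name` = c("Plant A", "Site B"), check.names = FALSE)
)

toy_pick <- pick_table(toy_tables, c("NAICS", "Type"))
nrow(toy_pick)
```

If OSHA adds or reorders tables, this approach still finds the results table, and it fails loudly instead of silently returning the wrong one.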
# Standardize column names (lowercase with underscores), then save a copy
cleaned_table <- cleaned_table %>%
  clean_names()
write.csv(cleaned_table, "osha_table.csv", row.names = FALSE)
# Load the OSHA data
cleaned_table <- read.csv("osha_table.csv")
# Select relevant columns and store NAICS codes as text for joining
relevant_osha <- cleaned_table %>%
select(date_opened, naics, establishment_name, type) %>%
mutate(naics1 = as.character(naics))
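# Load the NAICS data (2022 NAICS codes and titles)
naics <- read_excel("/Users/rachellsanchez/Desktop/DJNF_Merrill/OSHA Project/NAICS_codes.xlsx") %>%
  clean_names() %>%
  rename(naics1 = x2022_naics_us_code, industry = x2022_naics_us_title) %>%
  # Store codes as text so they match the character codes in relevant_osha
  mutate(naics1 = as.character(naics1))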
# Join the OSHA data with the NAICS data
joined_table <- relevant_osha %>%
  inner_join(naics, by = "naics1")
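An inner join silently drops inspections whose NAICS code has no match in the lookup table, which may happen, for example, with a code from an earlier NAICS edition that is missing from the 2022 list. A minimal base-R sketch of checking what would be dropped before joining. The column names follow this project; the toy records, codes, and the `toy_` object names are illustrative only.

```r
# Toy inspection records and a toy NAICS lookup (codes stored as text on both sides)
toy_osha <- data.frame(
  establishment_name = c("Plant A", "Site B", "Shop C"),
  naics1 = c("311615", "238210", "999999")
)
toy_naics <- data.frame(
  naics1 = c("311615", "238210"),
  industry = c("Poultry Processing",
               "Electrical Contractors and Other Wiring Installation Contractors")
)

# Rows an inner join would drop: codes missing from the lookup table
unmatched <- toy_osha[!toy_osha$naics1 %in% toy_naics$naics1, ]
print(unmatched)

# Base-R inner join on the shared key (keeps the same rows as dplyr::inner_join)
toy_joined <- merge(toy_osha, toy_naics, by = "naics1")
nrow(toy_joined)
```

In the tidyverse pipeline, `anti_join(relevant_osha, naics, by = "naics1")` returns the same unmatched rows.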
# Find the count of inspections per industry
inspection_counts <- joined_table %>%
count(industry) %>%
arrange(desc(n))
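Raw inspection counts favor industries with many establishments, while the stated goal is to compare violations or complaints per business. One option is to divide inspections by the number of distinct establishments inspected in each industry. A base-R sketch with toy data: note that this is a rate per *inspected* establishment, since the OSHA table only lists businesses that were inspected; a true per-business rate would need an outside count of establishments in each industry.

```r
# Toy joined inspection records: one row per inspection
toy_inspections <- data.frame(
  industry = c("Poultry Processing", "Poultry Processing", "Poultry Processing",
               "Highway, Street, and Bridge Construction",
               "Highway, Street, and Bridge Construction"),
  establishment_name = c("Plant A", "Plant A", "Plant B", "Road Co", "Bridge Co")
)

# Number of inspections per industry
inspections <- tapply(toy_inspections$establishment_name,
                      toy_inspections$industry, length)

# Number of distinct establishments per industry
establishments <- tapply(toy_inspections$establishment_name,
                         toy_inspections$industry,
                         function(x) length(unique(x)))

# Inspections per inspected establishment, highest first
per_business <- sort(inspections / establishments, decreasing = TRUE)
round(per_business, 2)
```

Here poultry processing ranks first (3 inspections across 2 plants, or 1.5 each) even though both industries have two establishments.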
# Filter for Fat/Cat incidents
fatalities_industry <- relevant_osha %>%
filter(type == 'Fat/Cat') %>%
select(establishment_name, type, naics1)
# Join with NAICS data to get industry names
fatalities_industry <- fatalities_industry %>%
inner_join(naics, by = "naics1")
# Count Fat/Cat incidents per industry
fatcat_counts <- fatalities_industry %>%
count(industry) %>%
arrange(desc(n))
print(fatcat_counts)
# Save the Fat/Cat data to a CSV
write.csv(fatalities_industry, "fatalities_industry.csv", row.names = FALSE)
# Count Fat/Cat inspections per year for one industry
fatcat_by_year <- function(data, industry_name) {
  data %>%
    mutate(year = year(mdy(date_opened))) %>%
    filter(industry == industry_name, type == "Fat/Cat") %>%
    count(year)
}
# Poultry Processing Fat/Cat incidents by year
fatcat_by_year(joined_table, "Poultry Processing")
# Electrical Contractors Fat/Cat incidents by year
fatcat_by_year(joined_table, "Electrical Contractors and Other Wiring Installation Contractors")
# Highway, Street, and Bridge Construction Fat/Cat incidents by year
fatcat_by_year(joined_table, "Highway, Street, and Bridge Construction")
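`count(year)` only returns years with at least one Fat/Cat inspection, so a year with none simply disappears from the output, which can make a trend look steadier than it is. A base-R sketch that counts over a fixed 2014 to 2024 window (the range of the scraped query) so those years appear as zero. The dates are toy values in the month/day/year format that `mdy()` parses in the code above.

```r
# Toy opening dates (month/day/year) for one industry's Fat/Cat inspections
toy_dates <- c("03/14/2016", "11/02/2016", "07/21/2019", "01/09/2023")

# Parse the dates and extract the year
years <- as.integer(format(as.Date(toy_dates, format = "%m/%d/%Y"), "%Y"))

# Count per year over the full query window, keeping years with zero inspections
by_year <- table(factor(years, levels = 2014:2024))
by_year
```

In the tidyverse pipeline, adding `tidyr::complete(year = 2014:2024, fill = list(n = 0))` after `count(year)` has the same effect.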
Despite comprehensive OSHA guidelines established more than two decades ago, workplaces and industries are still failing to protect workers from severe injuries, accidents, hospitalizations, and even death. Almost all of the deaths in the Arkansas data resulted from negligence and were not reported to the federal agency on time.
Cross-referencing this list with the most recent BLS data available, from 2022, shows that men accounted for 72 of the 75 fatal injuries. The largest age group was 35 to 44, though every age group from 25 to 65 was represented. Most victims were wage and salary workers, who accounted for about 90% of the fatal injuries, compared with 7 deaths among self-employed workers.
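The shares behind the BLS figures can be checked directly. A small sketch, assuming that every worker who was not self-employed was counted as a wage and salary worker, so the wage and salary count is 75 minus 7, or 68.

```r
# Counts quoted from the 2022 BLS data
total_deaths <- 75
men <- 72
self_employed <- 7
# Assumption: everyone who was not self-employed was a wage and salary worker
wage_salary <- total_deaths - self_employed

# Percentage of all fatal injuries, rounded to one decimal place
shares <- round(100 * c(men = men, wage_salary = wage_salary,
                        self_employed = self_employed) / total_deaths, 1)
shares
```

Under that assumption, men make up 96% of the deaths and wage and salary workers about 90.7%, consistent with the 90% figure.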
This project aims to expose the industries most likely to endanger workers through patterns of violations and laxity around vital safety standards and regulations.
The analysis reveals the frequency and severity of workplace safety violations in Arkansas, and the same method can be applied to any state to identify and rank the most dangerous industries nationwide.
These are the companies OSHA investigates and inspects most often in Arkansas. Here are the most dangerous ones.
These industries have the most deaths in Arkansas.
Workers are falling in Arkansas, and no one is protecting them
How much money have companies paid in penalties for endangering workers?
How do Arkansas industries and injuries compare with those in other states?
In the future, I would like to collect data from other states and normalize it by worker population to make the samples comparable, much like ArkansasCovid.com did, so that Arkansas can be compared with other states. I would also use other databases, such as the BLS, to confirm the data and provide more in-depth demographic information.
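Normalizing by worker population usually means a rate per 100,000 workers, the convention BLS uses for its fatal injury rates (BLS bases them on full-time equivalent workers). A minimal sketch with placeholder numbers for two made-up states, not real data; `rate_per_100k` is a hypothetical helper name.

```r
# Hypothetical helper: events per 100,000 workers
rate_per_100k <- function(events, workers) {
  100000 * events / workers
}

# Placeholder inputs; replace with real death counts and employment figures
toy_states <- data.frame(
  state = c("State A", "State B"),
  deaths = c(50, 120),
  workers = c(1000000, 4000000)
)

toy_states$rate <- rate_per_100k(toy_states$deaths, toy_states$workers)

# Rank states by rate rather than by raw count
ranked <- toy_states[order(-toy_states$rate), ]
ranked
```

State A has fewer deaths but the higher rate (5 versus 3 per 100,000 workers), which is exactly the difference raw counts would hide.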
Rachell Sanchez-Smith | rs069@uark.edu | (479) 935-0882